Overview

Dataset statistics

Number of variables: 16
Number of observations: 8693
Missing cells: 200
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 128.0 B
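The percentages and sizes in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes that the missing-cell percentage is taken over all cells (variables × observations) and that the total size is the average record size times the number of rows; the variable names are illustrative only.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 16
n_observations = 8693
missing_cells = 200
avg_record_bytes = 128.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size * number of rows (1 MiB = 2**20 B).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both printed values round to the figures shown in the table (0.1% and 1.1 MiB).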

Variable types

Categorical: 8
Numeric: 8

Alerts

PassengerId has a high cardinality: 8693 distinct values [High cardinality]
Name has a high cardinality: 8473 distinct values [High cardinality]
Cabin is highly overall correlated with HomePlanet and 1 other fields [High correlation]
Cabin_num is highly overall correlated with HomePlanet and 1 other fields [High correlation]
HomePlanet is highly overall correlated with Cabin and 2 other fields [High correlation]
CryoSleep is highly overall correlated with Transported [High correlation]
RoomService is highly overall correlated with CryoSleep [High correlation]
FoodCourt is highly overall correlated with CryoSleep [High correlation]
ShoppingMall is highly overall correlated with CryoSleep [High correlation]
Spa is highly overall correlated with CryoSleep [High correlation]
VRDeck is highly overall correlated with CryoSleep [High correlation]
Destination is highly overall correlated with HomePlanet [High correlation]
Transported is highly overall correlated with CryoSleep [High correlation]
Name has 200 (2.3%) missing values [Missing]
PassengerId is uniformly distributed [Uniform]
Name is uniformly distributed [Uniform]
PassengerId has unique values [Unique]
Cabin has 256 (2.9%) zeros [Zeros]
Age has 178 (2.0%) zeros [Zeros]
RoomService has 5577 (64.2%) zeros [Zeros]
FoodCourt has 5471 (62.9%) zeros [Zeros]
ShoppingMall has 5683 (65.4%) zeros [Zeros]
Spa has 5324 (61.2%) zeros [Zeros]
VRDeck has 5497 (63.2%) zeros [Zeros]

Reproduction

Analysis started: 2022-12-02 01:19:44.043068
Analysis finished: 2022-12-02 01:20:03.762876
Duration: 19.72 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

PassengerId
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 8693
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
0001_01
 
1
7999_01
 
1
8014_01
 
1
8012_03
 
1
8012_02
 
1
Other values (8688)
8688 

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Characters and Unicode

Total characters: 60851
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
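
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and the report itself may compute these figures differently. The two-letter codes `Nd` and `Pc` are the Unicode abbreviations for "Decimal Number" and "Connector Punctuation".

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts

# The five PassengerId values listed under "Sample".
sample = ["0001_01", "0002_01", "0003_01", "0003_02", "0004_01"]
counts = category_counts(sample)
print(counts)
```

Each identifier contributes six decimal digits and one underscore, which matches the 85.7% / 14.3% split reported for the full column.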

Unique

Unique: 8693
Unique (%): 100.0%

Sample

1st row: 0001_01
2nd row: 0002_01
3rd row: 0003_01
4th row: 0003_02
5th row: 0004_01

Common Values

ValueCountFrequency (%)
0001_01 1
 
< 0.1%
7999_01 1
 
< 0.1%
8014_01 1
 
< 0.1%
8012_03 1
 
< 0.1%
8012_02 1
 
< 0.1%
8012_01 1
 
< 0.1%
8010_01 1
 
< 0.1%
8007_02 1
 
< 0.1%
8007_01 1
 
< 0.1%
8006_01 1
 
< 0.1%
Other values (8683) 8683
99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0001_01 1
 
< 0.1%
0044_01 1
 
< 0.1%
0003_01 1
 
< 0.1%
0003_02 1
 
< 0.1%
0004_01 1
 
< 0.1%
0005_01 1
 
< 0.1%
0006_01 1
 
< 0.1%
0007_01 1
 
< 0.1%
0008_01 1
 
< 0.1%
0008_03 1
 
< 0.1%
Other values (8683) 8683
99.9%

Most occurring characters

ValueCountFrequency (%)
0 12459
20.5%
1 9827
16.1%
_ 8693
14.3%
2 5017
8.2%
3 4039
 
6.6%
4 3790
 
6.2%
6 3664
 
6.0%
5 3606
 
5.9%
8 3557
 
5.8%
7 3410
 
5.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 52158
85.7%
Connector Punctuation 8693
 
14.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 12459
23.9%
1 9827
18.8%
2 5017
9.6%
3 4039
 
7.7%
4 3790
 
7.3%
6 3664
 
7.0%
5 3606
 
6.9%
8 3557
 
6.8%
7 3410
 
6.5%
9 2789
 
5.3%
Connector Punctuation
ValueCountFrequency (%)
_ 8693
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 60851
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 12459
20.5%
1 9827
16.1%
_ 8693
14.3%
2 5017
8.2%
3 4039
 
6.6%
4 3790
 
6.2%
6 3664
 
6.0%
5 3606
 
5.9%
8 3557
 
5.8%
7 3410
 
5.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 60851
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 12459
20.5%
1 9827
16.1%
_ 8693
14.3%
2 5017
8.2%
3 4039
 
6.6%
4 3790
 
6.2%
6 3664
 
6.0%
5 3606
 
5.9%
8 3557
 
5.8%
7 3410
 
5.6%

HomePlanet
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
0
4714 
1
2175 
2
1804 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 1
4th row: 1
5th row: 0

Common Values

ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

Most occurring characters

ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4714
54.2%
1 2175
25.0%
2 1804
 
20.8%

CryoSleep
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
0
5568 
1
3125 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Most occurring characters

ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 5568
64.1%
1 3125
35.9%

Cabin
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.3010468
Minimum: 0
Maximum: 7
Zeros: 256
Zeros (%): 2.9%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 5
Q3: 6
95-th percentile: 6
Maximum: 7
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.7602883
Coefficient of variation (CV): 0.40926974
Kurtosis: -0.30516724
Mean: 4.3010468
Median Absolute Deviation (MAD): 1
Skewness: -0.94842497
Sum: 37389
Variance: 3.0986149
Monotonicity: Not monotonic
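
Several of the figures above are derived from one another. The following sketch recomputes them from the reported Cabin values using the standard textbook definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1). The number of observations comes from the Overview, and the variable names are illustrative.

```python
import math

# Values copied from the Cabin statistics; n comes from the Overview.
n = 8693
mean = 4.3010468
std = 1.7602883
q1, q3 = 3, 6
minimum, maximum = 0, 7

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values (no missing cells here)

print(f"CV={cv:.8f} variance={variance:.7f} IQR={iqr} range={value_range} sum={total:.0f}")
```

Up to rounding of the inputs, the results agree with the reported CV (0.40926974), variance (3.0986149), IQR (3), range (7), and sum (37389).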
Histogram with fixed size bins (bins=8)
ValueCountFrequency (%)
5 2821
32.5%
6 2564
29.5%
4 1025
 
11.8%
1 779
 
9.0%
2 747
 
8.6%
3 495
 
5.7%
0 256
 
2.9%
7 6
 
0.1%
ValueCountFrequency (%)
0 256
 
2.9%
1 779
 
9.0%
2 747
 
8.6%
3 495
 
5.7%
4 1025
 
11.8%
5 2821
32.5%
6 2564
29.5%
7 6
 
0.1%
ValueCountFrequency (%)
7 6
 
0.1%
6 2564
29.5%
5 2821
32.5%
4 1025
 
11.8%
3 495
 
5.7%
2 747
 
8.6%
1 779
 
9.0%
0 256
 
2.9%

Cabin_num
Real number (ℝ)

Distinct: 1817
Distinct (%): 20.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 591.52905
Minimum: 0
Maximum: 1894
Zeros: 22
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 30
Q1: 166
median: 407
Q3: 983
95-th percentile: 1561
Maximum: 1894
Range: 1894
Interquartile range (IQR): 817

Descriptive statistics

Standard deviation: 509.49978
Coefficient of variation (CV): 0.86132673
Kurtosis: -0.66160158
Mean: 591.52905
Median Absolute Deviation (MAD): 313
Skewness: 0.75153112
Sum: 5142162
Variance: 259590.03
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
289 94
 
1.1%
82 28
 
0.3%
21 27
 
0.3%
90 23
 
0.3%
0 22
 
0.3%
86 22
 
0.3%
19 22
 
0.3%
176 21
 
0.2%
97 21
 
0.2%
56 21
 
0.2%
Other values (1807) 8392
96.5%
ValueCountFrequency (%)
0 22
0.3%
1 15
0.2%
2 11
0.1%
3 16
0.2%
4 7
 
0.1%
5 13
0.1%
6 12
0.1%
7 9
0.1%
8 13
0.1%
9 16
0.2%
ValueCountFrequency (%)
1894 1
< 0.1%
1893 1
< 0.1%
1892 1
< 0.1%
1891 1
< 0.1%
1888 2
< 0.1%
1886 1
< 0.1%
1884 1
< 0.1%
1880 1
< 0.1%
1878 1
< 0.1%
1877 1
< 0.1%

Cabin_port
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
0
4403 
1
4290 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Most occurring characters

ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 4403
50.6%
1 4290
49.4%

Destination
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
2
6082 
0
1815 
1
796 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Most occurring characters

ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 6082
70.0%
0 1815
 
20.9%
1 796
 
9.2%

Age
Real number (ℝ)

Distinct: 80
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.58921
Minimum: 0
Maximum: 79
Zeros: 178
Zeros (%): 2.0%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4
Q1: 19
median: 27
Q3: 37
95-th percentile: 55
Maximum: 79
Range: 79
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 14.451078
Coefficient of variation (CV): 0.50547314
Kurtosis: 0.11527462
Mean: 28.58921
Median Absolute Deviation (MAD): 9
Skewness: 0.44745822
Sum: 248526
Variance: 208.83365
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
18 363
 
4.2%
21 328
 
3.8%
24 324
 
3.7%
19 317
 
3.6%
22 294
 
3.4%
23 292
 
3.4%
20 284
 
3.3%
26 268
 
3.1%
28 267
 
3.1%
27 260
 
3.0%
Other values (70) 5696
65.5%
ValueCountFrequency (%)
0 178
2.0%
1 67
 
0.8%
2 75
0.9%
3 75
0.9%
4 71
 
0.8%
5 34
 
0.4%
6 40
 
0.5%
7 53
 
0.6%
8 50
 
0.6%
9 44
 
0.5%
ValueCountFrequency (%)
79 3
 
< 0.1%
78 3
 
< 0.1%
77 2
 
< 0.1%
76 2
 
< 0.1%
75 4
< 0.1%
74 5
0.1%
73 7
0.1%
72 4
< 0.1%
71 7
0.1%
70 9
0.1%

VIP
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
0
8494 
1
 
199

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

Most occurring characters

ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 8494
97.7%
1 199
 
2.3%

RoomService
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1293
Distinct (%): 14.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 245.71023
Minimum: 0
Maximum: 14327
Zeros: 5577
Zeros (%): 64.2%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 77
95-th percentile: 1419.4
Maximum: 14327
Range: 14327
Interquartile range (IQR): 77

Descriptive statistics

Standard deviation: 683.58887
Coefficient of variation (CV): 2.7820937
Kurtosis: 57.242125
Mean: 245.71023
Median Absolute Deviation (MAD): 0
Skewness: 5.8374018
Sum: 2135959
Variance: 467293.74
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 5577
64.2%
1 117
 
1.3%
2 79
 
0.9%
3 61
 
0.7%
4 47
 
0.5%
485 44
 
0.5%
5 28
 
0.3%
9 25
 
0.3%
6 24
 
0.3%
8 24
 
0.3%
Other values (1283) 2667
30.7%
ValueCountFrequency (%)
0 5577
64.2%
1 117
 
1.3%
2 79
 
0.9%
3 61
 
0.7%
4 47
 
0.5%
5 28
 
0.3%
6 24
 
0.3%
7 17
 
0.2%
8 24
 
0.3%
9 25
 
0.3%
ValueCountFrequency (%)
14327 1
< 0.1%
9920 1
< 0.1%
8586 1
< 0.1%
8243 1
< 0.1%
8209 1
< 0.1%
8168 1
< 0.1%
8151 1
< 0.1%
8142 1
< 0.1%
8030 1
< 0.1%
7406 1
< 0.1%

FoodCourt
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1526
Distinct (%): 17.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 459.71506
Minimum: 0
Maximum: 29813
Zeros: 5471
Zeros (%): 62.9%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 109
95-th percentile: 2669.4
Maximum: 29813
Range: 29813
Interquartile range (IQR): 109

Descriptive statistics

Standard deviation: 1595.7785
Coefficient of variation (CV): 3.471234
Kurtosis: 74.656669
Mean: 459.71506
Median Absolute Deviation (MAD): 0
Skewness: 7.1580172
Sum: 3996303
Variance: 2546509.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 5471
62.9%
1 116
 
1.3%
2 75
 
0.9%
3 53
 
0.6%
4 53
 
0.6%
575 39
 
0.4%
5 33
 
0.4%
6 31
 
0.4%
9 28
 
0.3%
7 27
 
0.3%
Other values (1516) 2767
31.8%
ValueCountFrequency (%)
0 5471
62.9%
1 116
 
1.3%
2 75
 
0.9%
3 53
 
0.6%
4 53
 
0.6%
5 33
 
0.4%
6 31
 
0.4%
7 27
 
0.3%
8 20
 
0.2%
9 28
 
0.3%
ValueCountFrequency (%)
29813 1
< 0.1%
27723 1
< 0.1%
27071 1
< 0.1%
26830 1
< 0.1%
21066 1
< 0.1%
18481 1
< 0.1%
17958 1
< 0.1%
17901 1
< 0.1%
17687 1
< 0.1%
17432 1
< 0.1%

ShoppingMall
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1124
Distinct (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 175.52203
Minimum: 0
Maximum: 23492
Zeros: 5683
Zeros (%): 65.4%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 32
95-th percentile: 912.4
Maximum: 23492
Range: 23492
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 599.18999
Coefficient of variation (CV): 3.4137595
Kurtosis: 332.84866
Mean: 175.52203
Median Absolute Deviation (MAD): 0
Skewness: 12.663393
Sum: 1525813
Variance: 359028.64
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 5683
65.4%
1 153
 
1.8%
2 80
 
0.9%
3 59
 
0.7%
4 45
 
0.5%
5 38
 
0.4%
7 36
 
0.4%
6 34
 
0.4%
13 29
 
0.3%
8 28
 
0.3%
Other values (1114) 2508
28.9%
ValueCountFrequency (%)
0 5683
65.4%
1 153
 
1.8%
2 80
 
0.9%
3 59
 
0.7%
4 45
 
0.5%
5 38
 
0.4%
6 34
 
0.4%
7 36
 
0.4%
8 28
 
0.3%
9 28
 
0.3%
ValueCountFrequency (%)
23492 1
< 0.1%
12253 1
< 0.1%
10705 1
< 0.1%
10424 1
< 0.1%
9058 1
< 0.1%
7810 1
< 0.1%
7185 1
< 0.1%
7148 1
< 0.1%
7104 1
< 0.1%
6805 1
< 0.1%

Spa
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1378
Distinct (%): 15.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 357.46555
Minimum: 0
Maximum: 22408
Zeros: 5324
Zeros (%): 61.2%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 89
95-th percentile: 2168.2
Maximum: 22408
Range: 22408
Interquartile range (IQR): 89

Descriptive statistics

Standard deviation: 1177.323
Coefficient of variation (CV): 3.2935287
Kurtosis: 68.15581
Mean: 357.46555
Median Absolute Deviation (MAD): 0
Skewness: 6.8335306
Sum: 3107448
Variance: 1386089.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 5324
61.2%
1 146
 
1.7%
2 105
 
1.2%
3 53
 
0.6%
5 53
 
0.6%
4 46
 
0.5%
2938 36
 
0.4%
7 34
 
0.4%
6 33
 
0.4%
8 28
 
0.3%
Other values (1368) 2835
32.6%
ValueCountFrequency (%)
0 5324
61.2%
1 146
 
1.7%
2 105
 
1.2%
3 53
 
0.6%
4 46
 
0.5%
5 53
 
0.6%
6 33
 
0.4%
7 34
 
0.4%
8 28
 
0.3%
9 28
 
0.3%
ValueCountFrequency (%)
22408 1
< 0.1%
18572 1
< 0.1%
16594 1
< 0.1%
16139 1
< 0.1%
15586 1
< 0.1%
15331 1
< 0.1%
15238 1
< 0.1%
14970 1
< 0.1%
13995 1
< 0.1%
13902 1
< 0.1%

VRDeck
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 1330
Distinct (%): 15.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 322.14817
Minimum: 0
Maximum: 24133
Zeros: 5497
Zeros (%): 63.2%
Negative: 0
Negative (%): 0.0%
Memory size: 68.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 67
95-th percentile: 1657
Maximum: 24133
Range: 24133
Interquartile range (IQR): 67

Descriptive statistics

Standard deviation: 1150.9937
Coefficient of variation (CV): 3.5728705
Kurtosis: 82.193077
Mean: 322.14817
Median Absolute Deviation (MAD): 0
Skewness: 7.5665315
Sum: 2800434
Variance: 1324786.4
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 5497
63.2%
1 139
 
1.6%
2 70
 
0.8%
3 56
 
0.6%
5 51
 
0.6%
148 48
 
0.6%
4 47
 
0.5%
6 32
 
0.4%
8 30
 
0.3%
7 29
 
0.3%
Other values (1320) 2694
31.0%
ValueCountFrequency (%)
0 5497
63.2%
1 139
 
1.6%
2 70
 
0.8%
3 56
 
0.6%
4 47
 
0.5%
5 51
 
0.6%
6 32
 
0.4%
7 29
 
0.3%
8 30
 
0.3%
9 25
 
0.3%
ValueCountFrequency (%)
24133 1
< 0.1%
20336 1
< 0.1%
17306 1
< 0.1%
17074 1
< 0.1%
16337 1
< 0.1%
14485 1
< 0.1%
12708 1
< 0.1%
12685 1
< 0.1%
12682 1
< 0.1%
12424 1
< 0.1%

Name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 8473
Distinct (%): 99.8%
Missing: 200
Missing (%): 2.3%
Memory size: 68.0 KiB
Anton Woody
 
2
Cuses Pread
 
2
Ankalik Nateansive
 
2
Grake Porki
 
2
Carry Contrevins
 
2
Other values (8468)
8483 

Length

Max length: 18
Median length: 15
Mean length: 13.833628
Min length: 7

Characters and Unicode

Total characters: 117489
Distinct characters: 53
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 8453
Unique (%): 99.5%

Sample

1st row: Maham Ofracculy
2nd row: Juanna Vines
3rd row: Altark Susent
4th row: Solam Susent
5th row: Willy Santantines

Common Values

ValueCountFrequency (%)
Anton Woody 2
 
< 0.1%
Cuses Pread 2
 
< 0.1%
Ankalik Nateansive 2
 
< 0.1%
Grake Porki 2
 
< 0.1%
Carry Contrevins 2
 
< 0.1%
Sus Coolez 2
 
< 0.1%
Troya Schwardson 2
 
< 0.1%
Apix Wala 2
 
< 0.1%
Elaney Webstephrey 2
 
< 0.1%
Sharie Gallenry 2
 
< 0.1%
Other values (8463) 8473
97.5%
(Missing) 200
 
2.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
willy 20
 
0.1%
casonston 18
 
0.1%
oneiles 16
 
0.1%
domington 15
 
0.1%
litthews 15
 
0.1%
browlerson 14
 
0.1%
fulloydez 14
 
0.1%
garnes 14
 
0.1%
cartez 14
 
0.1%
idace 13
 
0.1%
Other values (4880) 16833
99.1%

Most occurring characters

ValueCountFrequency (%)
e 12691
 
10.8%
a 10251
 
8.7%
n 9155
 
7.8%
8493
 
7.2%
r 7707
 
6.6%
o 6563
 
5.6%
i 6456
 
5.5%
l 6231
 
5.3%
s 5299
 
4.5%
t 4552
 
3.9%
Other values (43) 40091
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 92010
78.3%
Uppercase Letter 16986
 
14.5%
Space Separator 8493
 
7.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 12691
13.8%
a 10251
11.1%
n 9155
10.0%
r 7707
8.4%
o 6563
 
7.1%
i 6456
 
7.0%
l 6231
 
6.8%
s 5299
 
5.8%
t 4552
 
4.9%
y 4093
 
4.4%
Other values (17) 19012
20.7%
Uppercase Letter
ValueCountFrequency (%)
S 1530
 
9.0%
C 1499
 
8.8%
B 1412
 
8.3%
M 1261
 
7.4%
A 1194
 
7.0%
P 987
 
5.8%
H 911
 
5.4%
G 848
 
5.0%
D 809
 
4.8%
W 742
 
4.4%
Other values (15) 5793
34.1%
Space Separator
ValueCountFrequency (%)
8493
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 108996
92.8%
Common 8493
 
7.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 12691
 
11.6%
a 10251
 
9.4%
n 9155
 
8.4%
r 7707
 
7.1%
o 6563
 
6.0%
i 6456
 
5.9%
l 6231
 
5.7%
s 5299
 
4.9%
t 4552
 
4.2%
y 4093
 
3.8%
Other values (42) 35998
33.0%
Common
ValueCountFrequency (%)
8493
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 117401
99.9%
None 88
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 12691
 
10.8%
a 10251
 
8.7%
n 9155
 
7.8%
8493
 
7.2%
r 7707
 
6.6%
o 6563
 
5.6%
i 6456
 
5.5%
l 6231
 
5.3%
s 5299
 
4.5%
t 4552
 
3.9%
Other values (42) 40003
34.1%
None
ValueCountFrequency (%)
é 88
100.0%

Transported
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 68.0 KiB
1
4378 
0
4315 

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 8693
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 1

Common Values

ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Most occurring characters

ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 8693
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Most occurring scripts

ValueCountFrequency (%)
Common 8693
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8693
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 4378
50.4%
0 4315
49.6%

Interactions


Correlations


Auto

The auto setting selects an interpretable association measure for each pair of columns, based on the types of the two columns:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramér's V, [0,1]
  • Numerical-Categorical : Cramér's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.
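
To illustrate how the number of bins changes granularity, here is a minimal sketch that discretizes a numerical column into equal-width bins. The function name `discretize` and the equal-width scheme are assumptions made for illustration; the report does not state which binning scheme the library applies internally. The ages are the ten Age values shown in the first rows of the Sample section.

```python
def discretize(values, n_bins):
    """Map each number to an equal-width bin index in 0..n_bins-1.

    Hypothetical helper: equal-width binning is an assumption, not
    necessarily what the profiling library does internally.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; use a single bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [39, 24, 58, 33, 16, 44, 26, 35, 14, 45]
coarse = discretize(ages, n_bins=2)  # two wide bins
fine = discretize(ages, n_bins=4)    # four narrower bins
print(coarse)
print(fine)
```

With fewer bins, values such as 24 and 35 fall into the same category, so any association measured against a categorical column is coarser; with more bins they are separated.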

This configuration uses the recommended metric for each pair of columns.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
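
The calculation described above can be sketched in plain Python as follows. This is an illustrative sketch, not the report's implementation; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and giving tied values the mean of their positions is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the 1/n factors cancel


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]      # monotonic but nonlinear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Because the cubic relationship is monotonic, the ranks of x and y coincide and ρ reaches its maximum, while Pearson's r on the raw values stays below 1.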

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
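
As a minimal sketch of this formula and of the invariance under location and scale changes, assuming plain Python lists as input (the function name `pearson_r` is hypothetical):

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    # The 1/n factors of covariance and variances cancel.
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_gentle = pearson_r(x, [2 * v + 1 for v in x])    # gentle positive slope
r_steep = pearson_r(x, [100 * v - 7 for v in x])   # steep positive slope
r_down = pearson_r(x, [-3 * v for v in x])         # negative slope
print(r_gentle, r_steep, r_down)
```

Both increasing lines give the same r regardless of slope or offset, and the decreasing line gives the opposite sign.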

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
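
The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch, assuming no special treatment of ties (the function name `kendall_tau_a` is hypothetical):

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in X or Y; it counts toward neither group.
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]   # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6. Library implementations such as `scipy.stats.kendalltau` default to the τ-b variant, which adjusts the denominator for ties, so results can differ from this sketch when ties are present.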

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
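
A minimal sketch of the bias-corrected estimator, written from the published Bergsma (2013) correction rather than from the library's source code; the function name `cramers_v_corrected` is hypothetical, and the inputs are assumed to be two equal-length sequences of category labels.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma's correction: subtract the expected bias, shrink r and k.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a single-category variable has no association to measure
    return math.sqrt(phi2_corr / denom)


# Toy labels (not taken from the dataset): perfect and no association.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
v_indep = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_indep)
```

The `max(0.0, ...)` clamp is what keeps the corrected value at 0 under independence, where the uncorrected estimate would be pushed above 0 by sampling noise.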

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
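
The two views can be approximated in text form. The sketch below assumes records stored as dictionaries with `None` marking a missing value; the helper names and the toy records are hypothetical and are not taken from the dataset.

```python
def missing_counts(rows, columns):
    """Number of missing (None) values per column: the bar-chart view."""
    return {c: sum(1 for row in rows if row.get(c) is None) for c in columns}


def nullity_matrix(rows, columns):
    """One string per record: '#' for a present value, '.' for a missing one."""
    return [
        "".join("#" if row.get(c) is not None else "." for c in columns)
        for row in rows
    ]


# Toy records for illustration only.
columns = ["id", "age", "name"]
rows = [
    {"id": 1, "age": 30, "name": "A"},
    {"id": 2, "age": None, "name": "B"},
    {"id": 3, "age": 41, "name": None},
    {"id": 4, "age": None, "name": None},
]
counts = missing_counts(rows, columns)
matrix = nullity_matrix(rows, columns)
print(counts)
print("\n".join(matrix))
```

Reading the matrix row by row shows whether gaps tend to occur together in the same records, which the per-column counts alone cannot reveal.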

Sample

   PassengerId  HomePlanet  CryoSleep  Cabin  Cabin_num  Cabin_port  Destination  Age  VIP  RoomService  FoodCourt  ShoppingMall  Spa   VRDeck  Name                Transported
0  0001_01      1           0          1      0          0           2            39   0    0            0          0             0     0       Maham Ofracculy     0
1  0002_01      0           0          5      0          1           2            24   0    109          9          25            549   44      Juanna Vines        1
2  0003_01      1           0          0      0          1           2            58   1    43           3576       0             6715  49      Altark Susent       0
3  0003_02      1           0          0      0          1           2            33   0    0            1283       371           3329  193     Solam Susent        0
4  0004_01      0           0          5      1          1           2            16   0    303          70         151           565   2       Willy Santantines   1
5  0005_01      0           0          5      0          0           1            44   0    0            483        0             291   0       Sandie Hinetthews   1
6  0006_01      0           0          5      2          1           2            26   0    42           1539       3             0     0       Billex Jacostaffey  1
7  0007_01      0           0          5      3          1           2            35   0    0            785        17            216   0       Andona Beston       1
8  0008_01      1           1          1      1          0           0            14   0    0            0          0             0     0       Erraiam Flatic      1
9  0008_03      1           0          1      1          0           0            45   0    39           7295       589           110   124     Wezena Flatic       1
PassengerIdHomePlanetCryoSleepCabinCabin_numCabin_portDestinationAgeVIPRoomServiceFoodCourtShoppingMallSpaVRDeckNameTransported
86835474_010142750221000000Thew Strony1
86840693_01114140035000028530Mothab Dedometeel1
86850278_0100411023500008882648Judya Beachez0
86864637_01204221022406720501034Tark Ches0
86874974_02114263004500000148Lesat Vendeck1
86888772_0210390005300112703939400Naosura Motled0
86893821_01004309013500200867Violan Mcphernard0
86907746_011142890035000000Antinon Patoetic1
86914167_0100430901330044000334Ninaha Deckerson0
86922970_0100542022707408266281Dwin Adkinson0